ABSTRACT

We present the first results of the application of supervised classification methods to the Kepler 
Q1 long-cadence light curves of a subsample of 2288 stars measured in the asteroseismology 
program of the mission. The methods, originally developed in the framework of the CoRoT 
and Gaia space missions, are capable of identifying the most common types of stellar variability 
in a reliable way. Many new variables have been discovered, among which a large fraction 
are eclipsing/ellipsoidal binaries unknown prior to launch. A comparison is made between our 
classification from the Kepler data and the pre-launch class based on data from the ground, 
showing that the latter needs significant improvement. The noise properties of the Kepler data 
are compared to those of the exoplanet program of the CoRoT satellite. We find that Kepler 
improves on CoRoT by a factor of 2 to 2.3 in point-to-point scatter. 

Subject headings: methods: data analysis — methods: statistical — (stars:) binaries: eclipsing — stars: 
variables: other — techniques: photometric 

1. Introduction

The Kepler satellite, launched in March 2009, is NASA's first mission capable of finding
Earth-size and smaller planets (Borucki et al. 2010; Koch et al. 2010). It has a 0.95-m
aperture Schmidt telescope with a photometer comprising 42 CCDs, with a fixed field of view
of 105 square degrees in the constellations Cygnus and Lyra. It is designed to monitor
continuously the brightness of 160 000 stars during the first year, reduced to 100 000 stars
later in the mission. This results in high-quality light curves, not only interesting for the
detection of planets, but also of great importance for asteroseismology.

In this Letter, we present data from the asteroseismology program of the NASA Kepler Mission
(Gilliland et al. 2010). We present a search for variable stars and the application of
supervised classification methods to the Kepler long-cadence Q1 data of its asteroseismology
program, covering 33.5 days in total. All the light curves have 29.4-min time sampling. Both
the total time span and the sampling are very well suited to the study of short-period
eclipsing and ellipsoidal binaries, classical pulsators such as RR Lyr stars and Cepheids, and
nonradial pulsators such as β Cep stars, slowly pulsating B stars (SPBs), δ Sct stars and
γ Dor stars (see Aerts et al. (2010) for a definition of all these classes). We used the
stellar fluxes as they were delivered to us after preliminary data processing (Jenkins et al.
2010). In total, we analyzed 2288 Kepler light curves.

We compare our results to a pre-launch classi- 
fication based on data in the Kepler Input Catalog 
(KIC hereafter) and prepared by the Kepler As- 
teroseismic Science Consortium (KASC, Gilliland 
et al. 2010). Finally, we compare the point-to-point scatter of the Kepler data to that of
the CoRoT exoplanet data from the first long run (5 months) of that mission.



2. Adopted Methodology

To detect and extract the variables, we relied on the automated variability characterization
method as described in detail in Debosscher et al. (2007, 2009). This method searches for
three independent frequencies for every star, which are used to make a harmonic best-fit to
the trend-subtracted time series. In this way we obtain a homogeneous set of parameters for
each star, irrespective of its variability nature. The goal is not to achieve a good light
curve model, but rather to deduce a set of light curve parameters which is sufficient to
classify the variability.

Using this set of parameters, we classified the stars using a modified version of the
classifier based on Gaussian mixtures, described in Debosscher et al. (2007), with the
definition stars from Debosscher et al. (2009). We improved the performance of the algorithm
presented in Debosscher et al. (2009) by classifying the objects using a multi-stage tree. In
each node, we decide which groups of stars we want to distinguish. The best parameters are
then selected for that node and for each group the parameter distribution is approximated with
a mixture of multi-dimensional Gaussians. To each variable stellar target we assign a
probability that it belongs to a particular group.

In order to obtain the final probability for each variability class we multiply the
probabilities along the corresponding root-to-leaf path (see, e.g., Ripley et al. (1996) for
a general introduction and definition of tree-structured classifiers). In practice, this works
as follows for our application: the probability of a stellar target being an RR Lyr star of
type ab is, e.g., the probability of not being an eclipsing binary (first stage) times the
probability of belonging to the RR Lyr group (second stage) times the probability of being an
RR Lyr star of type ab.

This procedure was followed to compute class probabilities for each target. In order to
obtain the best candidates, we additionally used the Mahalanobis distance, which is a
multi-dimensional generalization of the one-dimensional statistical or standard distance as
described in Debosscher et al. (2009). A visual check has been performed as well.

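The harmonic fit of Section 2 can be illustrated with a minimal sketch. It assumes that the frequencies are already known (for instance from a periodogram) and fits one sinusoid per frequency by linear least squares. The function name `fit_harmonics`, the single-harmonic model and the synthetic light curve are illustrative assumptions, not the implementation of Debosscher et al.

```python
import numpy as np

def fit_harmonics(t, flux, freqs):
    """Least-squares fit of a constant plus one sinusoid per frequency.

    Model: offset + sum_i A_i * sin(2*pi*(f_i*t + phi_i)).
    Returns (amplitudes, phases, offset); phases are in cycles.
    """
    t = np.asarray(t, dtype=float)
    flux = np.asarray(flux, dtype=float)
    cols = [np.ones_like(t)]
    for f in freqs:
        cols.append(np.sin(2 * np.pi * f * t))
        cols.append(np.cos(2 * np.pi * f * t))
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, flux, rcond=None)
    a, b = coef[1::2], coef[2::2]        # sine and cosine coefficients
    amplitudes = np.hypot(a, b)
    phases = np.arctan2(b, a) / (2 * np.pi)
    return amplitudes, phases, coef[0]

# Synthetic Q1-like sampling: 33.5 days at 29.4-min cadence (values made up).
t = np.arange(0.0, 33.5, 29.4 / 1440.0)
true_freqs = [1.7, 3.1, 0.45]            # cycles per day
true_amps = [5e-3, 2e-3, 1e-3]
rng = np.random.default_rng(0)
flux = sum(A * np.sin(2 * np.pi * f * t) for A, f in zip(true_amps, true_freqs))
flux = flux + rng.normal(0.0, 1e-4, t.size)

amps, phases, offset = fit_harmonics(t, flux, true_freqs)
```

Because the model is linear in the sine and cosine coefficients once the frequencies are fixed, no iterative optimizer is needed in this simplified setting.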

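As a hedged illustration of how a Gaussian mixture per group can yield a group probability in a single node, the sketch below evaluates mixture densities and normalizes them. The use of Bayes' rule with explicit priors, the function names and all numerical values are assumptions made for this example; the text of Section 2 does not specify these details.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multi-dimensional Gaussian at point x."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    diff = x - mean
    norm = np.sqrt((2 * np.pi) ** mean.size * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)

def mixture_pdf(x, components):
    """components: list of (weight, mean, cov), weights summing to 1."""
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in components)

def group_probabilities(x, groups, priors):
    """Normalized group probabilities in one node (assumed Bayes' rule)."""
    joint = {g: priors[g] * mixture_pdf(x, comps) for g, comps in groups.items()}
    total = sum(joint.values())
    return {g: v / total for g, v in joint.items()}

# Two hypothetical groups described by two light curve parameters.
identity = np.eye(2)
groups = {
    "A": [(1.0, [0.0, 0.0], identity)],
    "B": [(0.5, [3.0, 0.0], identity), (0.5, [3.0, 3.0], identity)],
}
priors = {"A": 0.5, "B": 0.5}
probs = group_probabilities([0.2, 0.1], groups, priors)
```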

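The multiplication of probabilities along a root-to-leaf path can be sketched as follows. The tree layout, the labels and the branch probabilities are hypothetical and serve only to reproduce the RR Lyr type ab example of Section 2.

```python
def leaf_probabilities(node, prob=1.0, out=None):
    """Walk a tree given as {label: (branch_probability, subtree_or_None)}.

    Returns a dict mapping each leaf label to the product of the branch
    probabilities along its root-to-leaf path.
    """
    if out is None:
        out = {}
    for label, (p, subtree) in node.items():
        if subtree is None:
            out[label] = prob * p
        else:
            leaf_probabilities(subtree, prob * p, out)
    return out

# Hypothetical branch probabilities for one target (made-up numbers).
tree = {
    "eclipsing binary": (0.05, None),                # first stage
    "not eclipsing binary": (0.95, {
        "RR Lyr group": (0.80, {                     # second stage
            "RR Lyr ab": (0.90, None),               # third stage
            "RR Lyr c": (0.10, None),
        }),
        "other pulsators": (0.20, None),
    }),
}
probs = leaf_probabilities(tree)
p_rrab = probs["RR Lyr ab"]   # 0.95 * 0.80 * 0.90
```

As long as the branch probabilities in every node sum to one, the leaf probabilities also sum to one, so they can be read directly as class probabilities.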
3. Classification results 

The variability classes we currently take into account, and the number of good candidate
class members, are listed in Table 1. While we classified more than 200 stars securely, the
class assignment of the majority of targets is still too ambiguous, mainly due to the limited
time base and to the large fraction of red giants in the sample (see below). Nevertheless, we
managed to identify numerous new pulsators and binaries from the short time series and from
early data reduction. All the illustrations presented in this paper contain stars that were
either unknown as variables or were misclassified prior to launch.

We evaluated our classification results by visual inspection and manual analysis of the best
candidates based on the Mahalanobis distance, and by comparing them with the pre-launch
classification, which is often insecure due to limited ground data. The results are summarized
in Table 1. In the second and third columns, a Mahalanobis distance smaller than 3 and 2 is
taken, respectively, without visual inspection. In the fourth column the final numbers of
candidates are given, taking a Mahalanobis distance less than 3 and performing an additional
visual inspection. In the last column, the pre-launch classification done by KASC members is
given. Note that several targets occur in more than one pre-launch class. For the majority of
the classes in Table 1 our results appreciably improve the pre-launch results, which were
necessarily based on ground-based data and could not take into account Kepler's high quality
light curves. Good agreement is obtained for binaries and classical pulsators such as RR Lyr
stars and Cepheids, but even there, we are able to improve the pre-launch results as some
RR Lyr candidates were reclassified by us as eclipsing binaries. One example is given in the
third panel of Fig. 1.

We could also identify many new (eclipsing) binaries, some of which are shown in Fig. 1. An
example of a red giant pulsator with solar-like oscillations in an eclipsing binary is
discussed in detail in [...] (2010).

For main-sequence nonradial pulsators, such as β Cep, δ Sct, SPB and γ Dor stars, there is
much discrepancy between the pre-launch classification and ours. Few of the pre-launch
candidates turn out to be actual class members. Indeed, we do not find high-probability
candidates among these stars with a pre-launch class assignment. On the other hand, we
identified new nonradial pulsators not present in the pre-launch lists. Some examples are
shown in Fig. 2. Similar light curves, albeit for fainter stars, were found in the CoRoT
exoplanet database (e.g., Degroote et al. 2009).

We have not yet been able to compare our results for the long period variables along or past
the Asymptotic Giant Branch, such as Miras or RV Tau stars, since the current total time span
of the light curves is only a fraction of the typical pulsation periods of those objects.
Moreover, we did not yet search for short-period pulsators, such as solar-like pulsators
along the main sequence, rapidly oscillating Ap stars, subdwarf OB variables and white dwarf
pulsators (see Aerts et al. (2010) for class definitions), given that we do not yet have
short-cadence data.

Another point of attention is the classification of solar-like pulsators. Stochastic
pulsations are more easily recognized from a broad power excess than from the methodology
adopted here. Indeed, the three highest frequency peaks selected automatically will almost
always be peaks due to granulation and/or background noise. Thus, to find such pulsators, it
is better to use an extractor-type approach. This involves fitting and subtraction of the
granulation and background signal to characterize the type of star, after which the
oscillations can be sought. We refer to Chaplin et al. (2010), Bedding et al. (2010) and
Stello et al. (2010) for a study of solar-like pulsators among Kepler targets.

Finally, we stress that our classifiers solely use the information contained in the Kepler
light curves. This implies that we cannot discriminate well between the class pairs of B-type
β Cep and A-type δ Sct stars with periodicities of the order of hours, and B-type SPB versus
F-type γ Dor stars with oscillation periods of days. Both pairs of classes contain light
curves with very similar characteristics. To discriminate between them, at least some spectral
information, such as a properly dereddened B-V color index or a stellar spectrum, is needed.
According to the Initial Mass Function, most of these candidates should be AF-type pulsators.

4. Noise properties of the Kepler data in 
the asteroseismology program 

Figure 3 shows the point-to-point scatter in the Kepler data of its asteroseismology program
as described in Gilliland et al. (2010), as well as a comparison with that of the CoRoT space
mission's exoplanet data for its first long run of 5 months (Auvergne et al. 2009; Aigrain et
al. 2009). We rescaled the CoRoT data to the same integration time of 29.4 min as for the
long-cadence mode of Kepler in order to ensure an appropriate comparison. Duty cycles of these
CoRoT and Kepler data are some 90% and 99%, respectively. It can be seen that the Kepler data,
with still preliminary processing (Jenkins et al. 2010), already outperform the CoRoT
exoplanet data by a factor of ~2 for objects with a visual magnitude of around 14, up to a
factor of ~2.3 for objects of magnitude around 16. This is more or less as expected based on
the difference in aperture size of the two instruments. Importantly for follow-up studies of
the most interesting variable stars, the Kepler asteroseismology sample focuses on brighter
objects.
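To make the comparison of point-to-point scatter at a common integration time concrete, the sketch below computes a scatter estimate and rebins a light curve to a longer integration time. Both the definition used here (standard deviation of successive differences divided by the square root of two) and the rebinning by plain averaging are assumptions for illustration; the rescaling actually applied to the CoRoT data is not specified in the text. The noise is simulated as white noise with a made-up level.

```python
import numpy as np

def point_to_point_scatter(flux):
    """Standard deviation of successive differences, divided by sqrt(2)."""
    diffs = np.diff(np.asarray(flux, dtype=float))
    return float(np.std(diffs) / np.sqrt(2.0))

def bin_light_curve(flux, n):
    """Average every n consecutive points (drops an incomplete last bin)."""
    flux = np.asarray(flux, dtype=float)
    m = (flux.size // n) * n
    return flux[:m].reshape(-1, n).mean(axis=1)

rng = np.random.default_rng(1)
short = rng.normal(0.0, 1e-3, 40000)      # white noise, made-up level
binned = bin_light_curve(short, 4)        # four times longer integration

scatter_short = point_to_point_scatter(short)
scatter_binned = point_to_point_scatter(binned)
ratio = scatter_short / scatter_binned    # close to sqrt(4) = 2 for white noise
```

For uncorrelated noise the scatter decreases with the square root of the integration time, which is why the two data sets must be brought to the same integration time before comparing them.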

Fig. 1. Some newly discovered eclipsing binaries, which were not identified as such prior to
launch. The pre-launch KASC class is given on the right.

Fig. 2. Some examples of multiperiodic pulsators, not present in the pre-launch class lists
of pulsators. The pre-launch KASC class is given on the right.

Table 1

Stellar variability classes considered in this work. See Chapter 2 in Aerts et al. (2010)
for a definition of the classes. MD stands for the Mahalanobis distance as defined in
Debosscher et al. (2009). A hyphen '-' indicates variability types not taken into account
in our classification scheme.

Typical noise levels of the least variable stars in the Kepler asteroseismology program,
estimated as the average amplitude in the Fourier transform avoiding the low-frequency regime
below 1 cycle per day, range from 1.3 μmag for an 8th magnitude star to 34 μmag for a 16th
magnitude star. More than 70% of the KASC stars are variable in one way or another, even
taking into account residual instrumental effects. Note that this is a much higher fraction
than for the CoRoT exoplanet programme, because the KASC target selection was aimed at
focusing on variable stars while the CoRoT exoplanet sample is unbiased with respect to
variability.
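A noise level defined as the average amplitude in the Fourier transform above 1 cycle per day can be estimated along the following lines. This is a minimal sketch that assumes strictly evenly sampled data and uses the fast Fourier transform; the function name, the amplitude normalization and the simulated white-noise light curve are illustrative assumptions, and real light curves with gaps would call for a different periodogram.

```python
import numpy as np

def mean_fourier_amplitude(t, flux, fmin=1.0):
    """Mean amplitude of the discrete Fourier transform above fmin.

    Assumes evenly sampled times t in days, so fmin is in cycles per day.
    """
    t = np.asarray(t, dtype=float)
    flux = np.asarray(flux, dtype=float)
    dt = t[1] - t[0]
    n = flux.size
    freqs = np.fft.rfftfreq(n, d=dt)
    # Scaled so that a sinusoid of amplitude A at an exact Fourier
    # frequency (other than Nyquist) yields a peak of height A.
    amps = 2.0 * np.abs(np.fft.rfft(flux - flux.mean())) / n
    return float(amps[freqs >= fmin].mean())

# Simulated light curve: 1640 points at 29.4-min cadence, white noise only.
t = np.arange(1640) * 29.4 / 1440.0
sigma = 1e-4                              # made-up per-point noise
rng = np.random.default_rng(2)
flux = rng.normal(0.0, sigma, t.size)

level = mean_fourier_amplitude(t, flux)
expected = sigma * np.sqrt(np.pi / t.size)   # white-noise expectation
```

For pure white noise the mean amplitude scales as the per-point scatter times the square root of pi over the number of points, so longer time series lower the noise level in the amplitude spectrum.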

5. Conclusions 

We presented the first results of the application of automated supervised classification
methods to 2288 Kepler light curves. Comparison with the existing pre-launch classification
and with manual classification of the light curves shows the capabilities of our methodology:
we are able to improve substantially on the pre-existing classification results, and to
identify new class members that were unknown prior to launch.

We will repeat the classifications as more and longer time-span light curves become
available. This way, we will also be able to identify variables with longer periodicities and
have much better capacity to unravel beat periods in multiperiodic pulsators. More classes
will be included in the classification scheme, and, even more importantly, the class
definition stars will be updated using the high quality Kepler light curves themselves. Access
to 1-min cadence data will allow us to classify shorter period variables, in addition to those
presented here. With each new set of data, our results will be updated and made available.

Fig. 3. Point-to-point scatter of the long-cadence light curves in Kepler's asteroseismology
program (Gilliland et al. 2010; black) compared to that of the light curves in the CoRoT
LRc01 exoplanet programme (Auvergne et al. 2009; gray).